AITopics | ucb-based algorithm

Collaborating Authors

ucb-based algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

UCB-based Algorithms for Multinomial Logistic Regression Bandits

Neural Information Processing SystemsDec-23-2025, 19:46:17 GMT

Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner aims to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click', `show me later', `never show again', `no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1\geq 2$ possible outcomes (+1 stands for the `not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1\geq 2$ outcomes, say outcome $i$, with a MNL probabilistic model with corresponding unknown parameter $\bar{\boldsymbol{\theta}}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $\rho_i$ and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm, that achieves regret $\tilde{\mathcal{O}}(dK\sqrt{T})$ with small dependency on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.

multinomial logistic regression bandit, name change, ucb-based algorithm, (5 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

Add feedback

Q-Learning with Fine-Grained Gap-Dependent Regret

Zhang, Haochen, Zheng, Zhong, Xue, Lingzhou

arXiv.org Machine LearningOct-9-2025

We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al.,2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.

algorithm, equation, neural information processing system, (9 more...)

arXiv.org Machine Learning

2510.06647

Country:

North America > United States > Pennsylvania (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

UCB-based Algorithms for Multinomial Logistic Regression Bandits

Neural Information Processing SystemsOct-9-2024, 15:15:44 GMT

Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner aims to maximize the profit over a user that can select one of two possible outcomes (e.g., click' vs no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., click', show me later', never show again', no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of K 1\geq 2 possible outcomes ( 1 stands for the not click' outcome): we assume that for a learner's action \mathbf{x}_t, the user selects one of K 1\geq 2 outcomes, say outcome i, with a MNL probabilistic model with corresponding unknown parameter \bar{\boldsymbol{\theta}}_{\ast i} . Each outcome i is also associated with a revenue parameter \rho_i and the goal is to maximize the expected revenue.

click, multinomial logistic regression bandit, ucb-based algorithm, (3 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.40)
Research Report > Experimental Study (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Add feedback

Recommenadation aided Caching using Combinatorial Multi-armed Bandits

J, Pavamana K, Singh, Chandramani Kishore

arXiv.org Artificial IntelligenceMay-3-2024

We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.

algorithm, rec, recommendation, (16 more...)

arXiv.org Artificial Intelligence

2405.0008

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Industry: Telecommunications (0.66)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.60)

Add feedback